VAST 2012 Challenge
Mini-Challenge 1:

 

 

Team Members:

Le Duan, University of Konstanz, duan.le@uni-konstanz.de PRIMARY

Serkan Duman, University of Konstanz, serkan.duman@uni-konstanz.de

Yao Zhang, University of Konstanz, yao.zhang@uni-konstanz.de

 

Student Team: Yes.

 

Tool(s):

Rendering the results: Processing, developed by Ben Fry and Casey Reas in 2001 at the MIT Media Lab [http://processing.org].

Data storage: MySQL [www.mysql.com].

Main programming IDE: Eclipse [http://www.eclipse.org].

Rendering the results (pie chart): Matlab [http://www.mathworks.co.uk/products/matlab/index.html].

Data analysis: Tableau [www.tableausoftware.com].

 

 

Video:

 

VAST 2012 Challenge 

 

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1 Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe? 

In Fig 1 we show an overview of the health status of the entire Bank of Money enterprise as of 2 pm BMT on February 2. We define a machine as healthy when both its activity flag and its policy status equal 1, and as unhealthy otherwise. Since this definition gives only a lower bound on the healthy condition, the actual number of healthy computers may be larger. Most machines are healthy: no more than a quarter of all computers are unhealthy, which motivates a deeper look at the data. From the following matrix visualization we observe that about 12 regions have policyStatus deviations 1 and 2 only, while the other regions have all policy deviations.
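The health definition above can be written down as a small check. This is only a sketch of the rule as stated in the text; the function names and the sample records are hypothetical and are not taken from the challenge data.

```python
def is_healthy(policy_status: int, activity_flag: int) -> bool:
    """A machine counts as healthy only when both fields equal 1."""
    return policy_status == 1 and activity_flag == 1


def unhealthy_share(records):
    """Fraction of (policy_status, activity_flag) records that are not healthy."""
    records = list(records)
    if not records:
        return 0.0
    unhealthy = sum(1 for p, a in records if not is_healthy(p, a))
    return unhealthy / len(records)


# Hypothetical sample, not taken from the challenge data.
sample = [(1, 1), (1, 1), (2, 1), (1, 5)]
print(unhealthy_share(sample))  # 0.5
```

Because any machine with a non-1 value in either field is counted as unhealthy, the healthy count obtained this way is a conservative estimate.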

Fig 1. Overview of health status for 51 regions.


Fig 2 is a detailed view of the policyStatus changes for the 51 regions. In all regions most of the machines were working normally (the count for activity flag 1 is at least 100 times the count for any other flag). The more serious the policy status, the fewer machines were in that state; the decrease is roughly exponential, which we display using logarithmic normalization. The number of machines undergoing maintenance is the smallest, because maintenance does not follow a regular schedule, as suggested by business rule 3.

Fig 2. A detailed view of policyStatus changes for 51 regions.

 

Several regions have machines with policy status 2 and activity flag 5 that are never fixed (discussed further in question 2). In regions 4, 6, 11, 14, 24, 29, 32, ... the activity of adding a thumb drive or a DVD device (activity flag 5) was classified as a moderate policy deviation. In regions 14-17 and 20-23 this activity (policy 5) was rare. In headquarters and the large regions 1-10, however, added devices bringing virus infection problems also affect many machines.

 

MC 1.2  Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 

The figures below illustrate how the policy status and the activity flag change over time.


How did we get it?

1. First we use SQL to inner-join the metaStatus and meta-3-7 tables on the IP address. This yields a new table, saved as query2.csv, which contains businessUnit, policyStatus, activityFlag, healthtime and the number of machines in that status. The final results are produced with Tableau and Processing.
2. A short caption for the "matrix visualization"
   2.1. For the whole "matrix visualization", the x-axis is time and the y-axis is the region name. Within each medium rectangle, the x-axis is the policyStatus from 1 to 5 and the y-axis is the activityFlag from 1 to 5.

   2.2. The normalization and color saturation strategy
   For a single row of the matrix, we first read in all the status data of that region. For each combination of policyStatus and activityFlag, e.g. (1,1), (1,2) and so on, a method is called to get the maximum and minimum value of that combination over the whole time period. There are 25 such combinations, so the method is called 25 times for a single region. Once the 25 pairs of maximum and minimum values are obtained, a color is assigned to each small rectangle according to square-root normalization.

   Supplementary content for color saturation
   Generally, a dark cell means that the value of the corresponding combination is high. However, if the values for that combination do not vary much, the cell may always be dark.
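Step 1 above can be sketched as follows. The table names metaStatus and meta-3-7 and the output fields come from the text; the join column name ipAddr, the sample rows, and the use of an in-memory SQLite database in place of MySQL are assumptions made only to keep the example self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metaStatus (ipAddr TEXT, healthtime TEXT,
                             policyStatus INTEGER, activityFlag INTEGER);
    CREATE TABLE "meta-3-7" (ipAddr TEXT, businessUnit TEXT);
""")

# Hypothetical sample rows, not taken from the challenge data.
conn.executemany('INSERT INTO "meta-3-7" VALUES (?, ?)', [
    ("10.0.0.1", "region-5"),
    ("10.0.0.2", "region-5"),
    ("10.0.0.3", "region-10"),
])
conn.executemany("INSERT INTO metaStatus VALUES (?, ?, ?, ?)", [
    ("10.0.0.1", "2012-02-02 14:00", 2, 1),
    ("10.0.0.2", "2012-02-02 14:00", 2, 1),
    ("10.0.0.3", "2012-02-02 14:00", 1, 1),
    ("10.0.0.9", "2012-02-02 14:00", 1, 1),  # no metadata: dropped by the inner join
])

# Inner join on the IP address, then count machines per
# (businessUnit, policyStatus, activityFlag, healthtime).
QUERY = """
SELECT m.businessUnit, s.policyStatus, s.activityFlag, s.healthtime,
       COUNT(*) AS machineCount
FROM metaStatus AS s
INNER JOIN "meta-3-7" AS m ON s.ipAddr = m.ipAddr
GROUP BY m.businessUnit, s.policyStatus, s.activityFlag, s.healthtime
ORDER BY m.businessUnit, s.healthtime, s.policyStatus, s.activityFlag
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row corresponds to one line of a table such as query2.csv: a region, a status combination, a timestamp and the number of machines in that state.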

 

Fig 3, 4, 5. A matrix representation of the changes over time for all regions.
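The per-cell coloring behind these figures (step 2.2) can be illustrated with a minimal sketch. It assumes the saturation is the square root of the min-max ratio, and that a combination whose counts never vary is drawn fully dark; both are our reading of the description rather than the exact Processing code, and all names and sample counts are hypothetical.

```python
import math


def sqrt_normalize(value, lo, hi):
    """Map value in [lo, hi] to a saturation in [0, 1] via a square root."""
    if hi == lo:
        # Assumption: a constant series is drawn fully dark,
        # matching the remark that such cells "may always be dark".
        return 1.0
    return math.sqrt((value - lo) / (hi - lo))


def cell_saturations(series):
    """series maps (policyStatus, activityFlag) to counts over time
    for one region; each combination is normalized independently."""
    result = {}
    for combo, counts in series.items():
        lo, hi = min(counts), max(counts)
        result[combo] = [sqrt_normalize(c, lo, hi) for c in counts]
    return result


# Hypothetical counts for two of the 25 combinations of one region.
region = {(1, 1): [100, 400, 200], (2, 5): [3, 3, 3]}
sat = cell_saturations(region)
print(sat[(1, 1)])
print(sat[(2, 5)])
```

Because each combination is scaled against its own minimum and maximum, a dark cell indicates a value that is high relative to that combination's history, not necessarily a high absolute count.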

Finding 1: Black bands in the first and second afternoons
(First afternoon, February 2nd) The anomaly lasts about 11 hours in every region except headquarters, region 5 and region 10. In headquarters it lasts 16 hours; in regions 5 and 10 it lasts all day and has not been fixed. This may suggest the following facts. The attacks are always concentrated around 17:00 (starting about 5 hours before and ending about 5 hours after this time), which is the time the bank closes. It is therefore quite possible that staff process more activities on bank machines after 14:00 every workday, and that these machines are monitored less because the staff are occupied with essential business and a heavy workload.
According to the policy of the bank system, bank supervisors and IT support are supposed to fix a problem within 12 hours, so usually no serious problem should persist for more than 12 hours. In headquarters, however, the large workload means that more time is needed to fix the same problems, or the problems may become complicated and difficult to handle.
(Second afternoon, February 3rd) In this interval an almost entirely black matrix persists for about 11 hours, from around 14:00 to 24:00, in all regions. At the beginning of February 3rd, machines in general are no longer in a normal state but remain active, as indicated by more occurrences of activity flags 2, 3 and 4; this leads to a serious health situation in the afternoon. The duration of the period of serious policy deviation follows the same trend as in the first evening (February 2nd), but the picture is more complicated because of more machine activity (such as maintenance and added devices). We therefore see black blocks in most entries of the matrix. The dark block pattern may result from the transition from horizontal bars to vertical bars.

Finding 2: Black horizontal bars (first 2 rows) change to black vertical bars (first 2 columns).
In all regions, we can see black horizontal bars (first 2 rows) change to black vertical bars (first 2 columns), which suggests that the policy status is violated in the first evening and the activity flag is violated in the second evening. The black horizontal bars may suggest that the machines are working normally and are dealing with all kinds of activities, from healthy to critical ones. The vertical bars suggest that the machines are not in a normal state and may be undergoing maintenance or experiencing other trouble (activity flags 3, 4, 5).

Finding 3: A continuous black bar in regions 5 and 10.
It is easy to see that both region 5 and region 10 have a black bar at the top, as shown in the figures below. The reason is that the number of hosts with activity flag 1 did not change much during the period, which means most of the machines were working normally. Although some other areas also have a deep color, the values in those entries are low. This is caused by the cell-based normalization: we find the maximum and minimum value at the same position across a row of medium rectangles and assign a color to each cell according to the ratio. After inspecting the original data, we found no machines with policy status equal to 1 in region-5 and region-10. The conclusion is that not a single host there ever complies with the policies (policystatus = 1). The following sheet is the evidence for this: it shows the time series of total counts in regions 5 and 10 for all activity flags (compared in rows) and policy statuses (compared in columns). We can clearly see that column 1 (policy status 1) is empty.
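The check against the original data can be sketched as a simple aggregation over the joined counts. The row layout (region, policy status, machine count) and the sample values are hypothetical; only the idea of looking for regions with no host at policy status 1 comes from the text.

```python
from collections import defaultdict


def regions_without_compliant_hosts(rows):
    """rows: iterable of (business_unit, policy_status, machine_count).
    Returns the regions whose total count at policy status 1 is zero."""
    compliant = defaultdict(int)
    for unit, policy_status, count in rows:
        # Register every region, but only add counts for policy status 1.
        compliant[unit] += count if policy_status == 1 else 0
    return sorted(unit for unit, total in compliant.items() if total == 0)


# Hypothetical counts, not taken from the challenge data.
sample_rows = [
    ("region-5", 2, 40),
    ("region-5", 3, 5),
    ("region-10", 2, 12),
    ("region-4", 1, 90),
    ("region-4", 2, 3),
]
print(regions_without_compliant_hosts(sample_rows))
```

A region appears in the result only if it has rows in the data but none of them carry policy status 1, which is the situation described for regions 5 and 10.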

Finding 4: Local regular pattern in regions 4, 6, 11, 14, 24, 29, 32 (single cell) and regions 17, 19, 21, 22, 25, 26, 27 (pattern), ....
Several regions have machines with policy status 2 and activity flag 5 that are never fixed. The black row in region-5 comes from empty values in the first matrix row. The conclusion is that not a single host ever complies with the policies (policystatus = 1). Regions 5 and 10 started deviating from the policies very early, at 9 am on the first morning. The machines are working normally and are dealing with all kinds of activities, from healthy to critical ones, and successfully rejected all the threats. In regions 4, 6, 11 and 14 the activity of adding a thumb drive or a DVD device (flag 5) was classified as a moderate policy deviation. In regions 12, 15, 17, 19 and 21, however, this activity (policy 5) also brought virus infection problems.
The number of machines with unique activity and policy problems does not change much, which means there are always certain machines with such problems while the others work normally.

Fig 6, 7. A close-up view of region-5 and region-10.